Policy-Gradients for PSRs and POMDPs
نویسندگان
چکیده
In uncertain and partially observable environments control policies must be a function of the complete history of actions and observations. Rather than present an ever growing history to a learner, we instead track sufficient statistics of the history and map those to a control policy. The mapping has typically been done using dynamic programming, requiring large amounts of memory. We present a general approach to mapping sufficient statistics directly to control policies by combining the tracking of sufficient statistics with the use of policy-gradient reinforcement learning. The best known sufficient statistic is the belief state, computed from a known or estimated partially observable Markov decision process (POMDP) model. More recently, predictive state representations (PSRs) have emerged as a potentially compact model of partially observable systems. Our experiments explore the usefulness of both of these sufficient statistics, exact and estimated, in direct policy-search.
منابع مشابه
Hilbert Space Embeddings of PSRs
Many problems in machine learning and artificial intelligence involve discrete-time partially observable nonlinear dynamical systems. If the observations are discrete, then Hidden Markov Models (HMMs) (Rabiner, 1989) or, in the control setting, Partially Observable Markov Decision Processes (POMDPs) (Sondik, 1971) can be used to represent belief as a discrete distribution over latent states. Pr...
متن کاملPlanning in Models that Combine Memory with Predictive Representations of State
Models of dynamical systems based on predictive state representations (PSRs) use predictions of future observations as their representation of state. A main departure from traditional models such as partially observable Markov decision processes (POMDPs) is that the PSR-model state is composed entirely of observable quantities. PSRs have recently been extended to a class of models called memory...
متن کاملPredictive State Representations: A New Theory for Modeling Dynamical Systems
Modeling dynamical systems, both for control purposes and to make predictions about their behavior, is ubiquitous in science and engineering. Predictive state representations (PSRs) are a recently introduced class of models for discrete-time dynamical systems. The key idea behind PSRs and the closely related OOMs (Jaeger’s observable operator models) is to represent the state of the system as a...
متن کاملA Survey of Predictive State Representations
Predictive State Representations (PSRs) [10] are a model for a discrete-time finite action and observation stochastic systems, presented as an alternative to HMMs and POMDPs. A PSR represents the system’s state as a set of predictions of the observable outcomes of tests performed in the system. Unlike hidden variable models no latent variables are assumed or required – only observable outcomes ...
متن کاملPlanning in Decentralized POMDPs with Predictive Policy Representations
We discuss the problem of policy representation in stochastic and partially observable systems, and address the case where the policy is a hidden parameter of the planning problem. We propose an adaptation of the Predictive State Representations (PSRs) to this problem by introducing tests (sequences of actions and observations) on policies. The new model, called the Predictive Policy Representa...
متن کامل